Quantum computational gradient estimation 

David W. Bulger* 
February 1, 2008 



Abstract 

Classically, determining the gradient of a black-box function $f:\mathbb{R}^p\to\mathbb{R}$ requires
$p+1$ evaluations. Using the quantum Fourier transform, two evaluations suffice.
This is based on the approximate local periodicity of $e^{2\pi i\lambda f(x)}$. It is shown that
sufficiently precise machine arithmetic results in gradient estimates of any required
accuracy.

MSC2000 Subject Classification: 90C30, 68Q99, 68Q25 

Key words and phrases: quantum computation, gradient estimation. 

1 Introduction 

The vector gradient of a real-valued function $f$ of a vector argument can be calculated
using just two calls to a black-box quantum oracle for $f$. The mechanism is simple,
and capitalises on the fact that, in the vicinity of a point $x$, $e^{2\pi i\lambda f}$ is
approximately periodic, with period parallel and inversely proportional to $\nabla f(x)$.
A superposed state is created discretising a small hyperrectangle around the domain point,
the function is evaluated, the phase is rotated in proportion to the function value, the
oracle call is reversed, and a multidimensional quantum Fourier transform is applied to
the bits encoding the discretised hyperrectangle.

This paper establishes, under mild conditions on $f$, that the gradient estimation can
be performed to any required level of accuracy, in the sense that, given any $\delta > 0$ and
$\epsilon < 1$, we can produce a superposition of gradient estimates which, if observed, will
collapse to an estimate within $\delta$ of the true gradient with probability at least $\epsilon$. Greater
accuracy is achieved by increasing arithmetic precision and by increasing the number of
points in the sampling grid.

The paper's structure is as follows. Section 2 presents some assumptions on the
function $f$ whose gradient is sought, and Section 3 formalises the evaluation and
manipulation of values of $f$ within the quantum computer. Section 4 presents the gradient
estimation algorithm. Section 5 analyses the effect of the algorithm, consisting mostly
of the statement and proof of the main result, that any required accuracy is attainable
by using sufficiently precise arithmetic and a large enough sampling grid. Because the
rest of the paper discusses the computations rather abstractly, Section 6 briefly comments
on how the algorithm would be performed in practice.

2 Problem formulation



Let $D \subset \mathbb{R}^p$ have non-empty interior. Let $f$ be a twice-differentiable function from $D$ to
$\mathbb{R}$. At each point $x \in D$, let $\nabla f(x)$ denote the gradient and $Hf(x)$ the Hessian matrix
of $f$. Assume that $\|\nabla f(x)\|_\infty \le L$ and $\|Hf(x)\|_2 \le M$ for all $x \in D$. It is desired to
determine $\nabla f(x)$ for a point $x$ in the interior of $D$.
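
For comparison with the classical baseline of $p+1$ evaluations, the following is a minimal sketch of forward-difference gradient estimation. The function name `forward_difference_gradient`, the step size `h` and the test function `example` are illustrative assumptions, not part of the paper.

```python
from typing import Callable, List, Sequence, Tuple


def forward_difference_gradient(
    f: Callable[[Sequence[float]], float], x: Sequence[float], h: float = 1e-6
) -> Tuple[List[float], int]:
    """Estimate the gradient of f at x by forward differences.

    Returns the estimate together with the number of calls made to f,
    which is always len(x) + 1.
    """
    calls = 0

    def counted(point: Sequence[float]) -> float:
        nonlocal calls
        calls += 1
        return f(point)

    base = counted(x)  # one evaluation at the point itself
    grad = []
    for m in range(len(x)):
        shifted = list(x)
        shifted[m] += h  # perturb one coordinate at a time
        grad.append((counted(shifted) - base) / h)
    return grad, calls


def example(x: Sequence[float]) -> float:
    # Hypothetical smooth test function with gradient (2*x0 + 3, -x1).
    return x[0] ** 2 + 3.0 * x[0] - 0.5 * x[1] ** 2


estimate, evaluations = forward_difference_gradient(example, [1.0, 2.0])
```

Here `evaluations` equals $p+1 = 3$, which is the cost the quantum algorithm reduces to two oracle calls.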

3 Oracle Formalism 

This paper's main result is that an objective function's gradient can be calculated to any 
desired precision using the quantum algorithm described. Clearly, any particular data 
encoding method will only support a certain maximum precision; we therefore require a 
formalism in which points in the domain and range of $f$ can be represented in a variety
of ways. This section introduces the Hilbert spaces and operators involved. 

It will be helpful firstly to catalog the operators to be used as they would look if
precision and rounding error were not relevant. The computational system is a tripartite
system; the three parts have state spaces $\mathcal{D}$, $\mathcal{R}$ and $\mathcal{G}$ (standing for 'domain', 'range' and
'grid'), so the combined system has state space $\mathcal{D}\otimes\mathcal{R}\otimes\mathcal{G}$. Each computational basis
state is a tensor product $|d\rangle\otimes|r\rangle\otimes|g\rangle$ of one computational basis state from each of the
three factor spaces. For now, suppose that the basis indices $d$, $r$ and $g$ belong respectively
to $D$, $\mathbb{R}$ and $\mathbb{R}^p$. Let $g_0$ be the constant length-$p$ vector $(2^{n-1}-1/2,\ldots,2^{n-1}-1/2)$.
The operators are

$$
\begin{aligned}
U_f(|d\rangle\otimes|r\rangle\otimes|g\rangle) &= |d\rangle\otimes|r+f(d)\rangle\otimes|g\rangle,\\
U_+(|d\rangle\otimes|r\rangle\otimes|g\rangle) &= |d+\mu(g-g_0)\rangle\otimes|r\rangle\otimes|g\rangle,\\
U_R(|d\rangle\otimes|r\rangle\otimes|g\rangle) &= e^{2\pi i\lambda f(d)}|d\rangle\otimes|r\rangle\otimes|g\rangle,
\end{aligned}
$$

where $\mu$ and $\lambda$ are algorithm parameters, as well as the quantum Fourier transform and
inverses of $U_f$ and $U_+$.

Return now to precision considerations. Suppose that, for any positive $\nu$ and $\mu$
and natural $n$, we can construct a quantum oracle evaluating $f(x+\mu(g-g_0))$ for
$g\in\{0,\ldots,2^n-1\}^p$ and $x, x+\mu(g-g_0)\in D$, with an error uniformly bounded by $\nu$. In
particular, suppose that we can construct a system
$(U_f, U_+, \mathcal{D}, B_{\mathcal{D}}, \mathcal{R}, B_{\mathcal{R}}, \mathcal{G}, c_d, c_r, c_f, c_p)$, where

• $\mathcal{D}$, $\mathcal{R}$ and $\mathcal{G}$ are finite-dimensional Hilbert spaces,

• $B_{\mathcal{D}}$ and $B_{\mathcal{R}}$ are orthonormal bases for $\mathcal{D}$ and $\mathcal{R}$,

• $B_{\mathcal{G}} = \{0,\ldots,2^n-1\}^p$ is an orthonormal basis for $\mathcal{G}$,

• $B_{\mathcal{R}}$ is a group, under an operation we denote as '+', with an identity element we
denote as $|0\rangle$,

• $c_d : D \to B_{\mathcal{D}}$ (the 'domain encoding function'), $c_f : B_{\mathcal{D}} \to B_{\mathcal{R}}$, $c_r : B_{\mathcal{R}} \to \mathbb{R}$ (the
'range decoding function'), and $c_p : B_{\mathcal{D}} \times B_{\mathcal{G}} \to B_{\mathcal{D}}$,

• $U_f$ and $U_+$ are unitary operators on $\mathcal{D}\otimes\mathcal{R}\otimes\mathcal{G}$, given by

$$
\begin{aligned}
U_f|d\rangle\otimes|r\rangle\otimes|g\rangle &= |d\rangle\otimes|r+c_f(d)\rangle\otimes|g\rangle,\\
U_+|d\rangle\otimes|r\rangle\otimes|g\rangle &= |c_p(d,g)\rangle\otimes|r\rangle\otimes|g\rangle,
\end{aligned}
$$

• $c_p$ acts invertibly on $B_{\mathcal{D}}$, so that for each $d\in B_{\mathcal{D}}$ and $g\in B_{\mathcal{G}}$, $c_p^{-1}(c_p(d,g),g) = d$,

• $|f(x) - c_r\circ c_f\circ c_d(x)| \le \nu/2$ for all $x\in D$,

• $|c_r\circ c_f\circ c_d(x+\mu(g-g_0)) - c_r\circ c_f\circ c_p(c_d(x),g)| \le \nu/2$ for all $x\in D$ and $g\in B_{\mathcal{G}}$,
provided $x+\mu(g-g_0)$ is also in $D$,

and, further, that we can implement $U_f$ and $U_+$ on a quantum computer. (Note that
this formalism does not necessarily require the domain points represented by $B_{\mathcal{D}}$ to form
a grid; this may be of interest in optimising functions on manifolds.)

4 Algorithm 

The algorithm dealt with in this paper estimates the gradient of $f$ at a point $x$ in the
interior of its domain. Firstly, using quantum superposition, $f$ is evaluated at every
point of a hyperrectangular grid centred around $x$. The grid is small enough that $f$ is
approximately linear across it. Next, the phase of the quantum computational system
is rotated in proportion to the value of $f$ at each grid point. Now the phase varies
approximately periodically over the grid, and the period determines $\nabla f(x)$. The period
is easily determined by the quantum Fourier transform.

Two of the operators involved in the gradient estimation algorithm, $U_f$ and $U_+$, were
hypothesised in Section 3. Additionally, we will require their inverses $U_f^{-1}$ and $U_+^{-1}$, a
phase rotation operator $U_R$, and a $p$-dimensional quantum Fourier transform $U_{QFT}$.

The operators $U_f^{-1}$ and $U_+^{-1}$ invert the actions of $U_f$ and $U_+$, mapping $|d\rangle\otimes|r\rangle\otimes|g\rangle$
to $|d\rangle\otimes|r-c_f(d)\rangle\otimes|g\rangle$ (the subtraction $r-c_f(d)$ is according to the group structure
assumed on $B_{\mathcal{R}}$) and $|c_p^{-1}(d,g)\rangle\otimes|r\rangle\otimes|g\rangle$. Note that the function $f$ is not being
inverted.

The phase rotation operator $U_R$ involves a parameter $\lambda\in\mathbb{R}$, mapping $|d\rangle\otimes|r\rangle\otimes|g\rangle$
to $e^{2\pi i\lambda c_r(r)}|d\rangle\otimes|r\rangle\otimes|g\rangle$. The multidimensional quantum Fourier transform $U_{QFT}$ acts
on $\mathcal{G}$, mapping $|d\rangle\otimes|r\rangle\otimes|g\rangle$ to $2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} e^{2\pi i g\cdot h/2^n}|d\rangle\otimes|r\rangle\otimes|h\rangle$.

With these operators defined, the gradient estimation algorithm is easily stated.
Firstly, the state $|c_d(x)\rangle\otimes|0\rangle\otimes|0\rangle$ is prepared in $\mathcal{D}\otimes\mathcal{R}\otimes\mathcal{G}$, where $x\in D$ is the point at
which $\nabla f$ is sought. Then, the system $\mathcal{D}\otimes\mathcal{R}\otimes\mathcal{G}$ is subjected to
$U_{QFT}\circ U_+^{-1}\circ U_f^{-1}\circ U_R\circ U_f\circ U_+\circ U_{QFT}$.
This results, as we shall see in Section 5, in a state $|c_d(x)\rangle\otimes|0\rangle\otimes|\chi\rangle$, where in general
$|\chi\rangle$ is a superposition of computational basis states from $B_{\mathcal{G}}$.

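Interpretation of the resulting state $|\chi\rangle$ involves the "gradient decoding function"
$c_g : B_{\mathcal{G}}\to\mathbb{R}^p$, defined by

$$c_g : (g_1,\ldots,g_p) \mapsto (c_{g,1}(g_1),\ldots,c_{g,p}(g_p)),$$

where

$$
c_{g,m}(g_m) =
\begin{cases}
-g_m/(2^n\lambda\mu), & g_m\in\{0,\ldots,2^{n-1}-1\},\\
(2^n-g_m)/(2^n\lambda\mu), & g_m\in\{2^{n-1},\ldots,2^n-1\}.
\end{cases}
$$

If $|\chi\rangle$ is a basis state $|g\rangle$, it indicates that $\nabla f(x) = c_g(g)$. If, on the other hand, $|\chi\rangle$ is a
superposition $\sum_{g\in B_{\mathcal{G}}}\chi_g|g\rangle$, then the gradient estimate is indeterminate, comprising the
various discretised values $c_g(g)$ with the weights $|\chi_g|^2$.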

Altogether, in addition to the argument $x$, the gradient estimation algorithm depends
on the four parameters $n$, $\nu$, $\lambda$ and $\mu$. Accordingly, the algorithm will be denoted
$\mathcal{A}(n,\nu,\lambda,\mu;x)$.
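
The effect of the algorithm on the grid register can be imitated classically for very small $p$ and $n$. The sketch below assumes an exact oracle ($\nu = 0$), so that the grid register ends in the Fourier transform of the amplitudes $2^{-pn/2}e^{2\pi i\lambda f(x+\mu(h-g_0))}$, and it decodes with the sign convention implied by the transform $e^{+2\pi i g\cdot h/2^n}$. The names `simulate` and `decode`, the test function and the parameter values are illustrative assumptions; the classical cost is exponential in $pn$, so this is a check of the mechanism rather than a practical method.

```python
import cmath
import math
from itertools import product


def decode(g_m: int, n: int, lam: float, mu: float) -> float:
    """One component of the gradient decoding function c_g."""
    size = 2 ** n
    if g_m < size // 2:
        return -g_m / (size * lam * mu)
    return (size - g_m) / (size * lam * mu)


def simulate(f, x, n: int, lam: float, mu: float):
    """Return {g: probability} for the grid register after the algorithm.

    Assumes an exact oracle and enumerates all 2**(p*n) grid points.
    """
    p, size = len(x), 2 ** n
    g0 = size / 2 - 0.5  # every component of the vector g_0
    grid = list(product(range(size), repeat=p))
    norm = size ** (-p / 2)
    # Amplitude of |h> after oracle, phase rotation and uncomputation.
    psi = {
        h: norm * cmath.exp(
            2j * math.pi * lam * f([x[m] + mu * (h[m] - g0) for m in range(p)])
        )
        for h in grid
    }
    # Final quantum Fourier transform on the grid register.
    probs = {}
    for g in grid:
        amp = sum(
            a * cmath.exp(2j * math.pi * sum(gm * hm for gm, hm in zip(g, h)) / size)
            for h, a in psi.items()
        )
        probs[g] = abs(norm * amp) ** 2
    return probs


def f(v):
    # Hypothetical test function; its gradient at the origin is (3, -2).
    return 3.0 * v[0] - 2.0 * v[1] + 0.1 * v[0] * v[1]


n, lam, mu = 4, 125.0, 1e-3
probs = simulate(f, [0.0, 0.0], n, lam, mu)
best = max(probs, key=probs.get)
estimate = tuple(decode(g_m, n, lam, mu) for g_m in best)
```

With these values $1/2^n\lambda\mu = 1/2$, so both gradient components lie on the decoding grid and almost all of the probability falls on a single basis state.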
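
5 Behaviour

The state resulting from the algorithm $\mathcal{A}(n,\nu,\lambda,\mu;x)$ is

$$
\begin{aligned}
& U_{QFT}\circ U_+^{-1}\circ U_f^{-1}\circ U_R\circ U_f\circ U_+\circ U_{QFT}(|c_d(x)\rangle\otimes|0\rangle\otimes|0\rangle)\\
&= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} U_{QFT}\circ U_+^{-1}\circ U_f^{-1}\circ U_R\circ U_f\circ U_+(|c_d(x)\rangle\otimes|0\rangle\otimes|h\rangle)\\
&= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} U_{QFT}\circ U_+^{-1}\circ U_f^{-1}\circ U_R\circ U_f(|c_p(c_d(x),h)\rangle\otimes|0\rangle\otimes|h\rangle)\\
&= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} U_{QFT}\circ U_+^{-1}\circ U_f^{-1}\circ U_R(|c_p(c_d(x),h)\rangle\otimes|c_f\circ c_p(c_d(x),h)\rangle\otimes|h\rangle)\\
&= U_{QFT}(|c_d(x)\rangle\otimes|0\rangle\otimes|\psi\rangle)\\
&= |c_d(x)\rangle\otimes|0\rangle\otimes|\chi\rangle,
\end{aligned}
$$

where

$$|\psi\rangle = 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} e^{2\pi i\lambda\, c_r\circ c_f\circ c_p(c_d(x),h)}|h\rangle, \tag{1}$$

$$|\chi\rangle = U_{QFT}|\psi\rangle, \tag{2}$$

and of course $U_{QFT}$, acting on $\mathcal{G}$ alone, is defined by

$$U_{QFT}|g\rangle = 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} e^{2\pi i g\cdot h/2^n}|h\rangle.$$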

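Theorem 1 For any $\gamma, \delta > 0$ and $\epsilon < 1$, there exist parameters $n$, $\nu$, $\lambda$ and $\mu$ such that,
at every $x$ with $x+[-\gamma,\gamma]^p\subseteq D$, when $|\chi\rangle$ is produced according to $\mathcal{A}(n,\nu,\lambda,\mu;x)$,

$$\|P|\chi\rangle\|_2 \ge \epsilon, \tag{3}$$

where $P$ is the projection

$$\sum\{|h\rangle\langle h| : \|c_g(h)-\nabla f(x)\|_\infty \le \delta\}.$$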



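Proof It will be demonstrated that, if $n$, $\nu$, $\lambda$ and $\mu$ are chosen satisfying

$$4^{n-1}p\pi\lambda M\mu^2/\sqrt{5} \le (1-\epsilon)/3, \tag{4}$$

$$2\pi\lambda\nu \le (1-\epsilon)/3, \tag{5}$$

$$2^{n-1}\mu \le \gamma, \tag{6}$$

$$1/2\lambda\mu \ge L+\delta \quad\text{and} \tag{7}$$

$$\csc(\pi\lambda\mu\delta) \le \sqrt{2^n\left(1-((2+\epsilon)/3)^{2/p}\right)}, \tag{8}$$

then (3) holds. The reader can verify that one choice satisfying (4) to (8) is

$$
\begin{aligned}
n &= \left\lceil -\log_2\left(\sin^2\!\left(\frac{\pi\delta}{2(L+\delta)}\right)\left(1-((2+\epsilon)/3)^{2/p}\right)\right)\right\rceil,\\
\lambda &= \max\left\{\frac{2^{n-2}}{\gamma(L+\delta)},\ \frac{3\times 4^{n-2}p\pi M}{\sqrt{5}(L+\delta)^2(1-\epsilon)}\right\},\\
\mu &= 1/2\lambda(L+\delta),\\
\nu &= (1-\epsilon)/6\pi\lambda.
\end{aligned}
$$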
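
The parameter choice can be checked numerically. The sketch below assumes the reading of conditions (4) to (8) in which (4) carries a factor $p$ and $n$ is rounded up to an integer; the function names and the sample values of $\gamma$, $\delta$, $\epsilon$, $L$, $M$ and $p$ are illustrative assumptions. A small relative tolerance is allowed because several of the conditions hold with equality for this choice.

```python
import math


def choose_parameters(gamma, delta, eps, L, M, p):
    """Return (n, lam, mu, nu) following the choice in the proof."""
    s = math.sin(math.pi * delta / (2 * (L + delta))) ** 2
    n = math.ceil(-math.log2(s * (1 - ((2 + eps) / 3) ** (2 / p))))
    lam = max(
        2 ** (n - 2) / (gamma * (L + delta)),
        3 * 4 ** (n - 2) * p * math.pi * M
        / (math.sqrt(5) * (L + delta) ** 2 * (1 - eps)),
    )
    mu = 1 / (2 * lam * (L + delta))
    nu = (1 - eps) / (6 * math.pi * lam)
    return n, lam, mu, nu


def check_conditions(n, lam, mu, nu, gamma, delta, eps, L, M, p, tol=1e-9):
    """Evaluate conditions (4) to (8) with relative slack tol for rounding."""
    third = (1 - eps) / 3
    return {
        4: 4 ** (n - 1) * p * math.pi * lam * M * mu ** 2 / math.sqrt(5)
        <= third * (1 + tol),
        5: 2 * math.pi * lam * nu <= third * (1 + tol),
        6: 2 ** (n - 1) * mu <= gamma * (1 + tol),
        7: 1 / (2 * lam * mu) >= (L + delta) * (1 - tol),
        8: 1 / math.sin(math.pi * lam * mu * delta)
        <= math.sqrt(2 ** n * (1 - ((2 + eps) / 3) ** (2 / p))) * (1 + tol),
    }


# Hypothetical problem data.
settings = dict(gamma=0.1, delta=0.5, eps=0.5, L=2.0, M=1.0, p=2)
n, lam, mu, nu = choose_parameters(**settings)
conditions = check_conditions(n, lam, mu, nu, **settings)
```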

The algorithm contains three sources of error:

• $\nabla f(x)$ will not, in general, be exactly equal to $c_g(g)$ for some $g\in B_{\mathcal{G}}$;

• $\nabla f$ will not, in general, be exactly constant throughout the sampling grid;

• calculations are performed to a finite precision, so that $c_r\circ c_f\circ c_p(c_d(x),h)$ will not,
in general, exactly equal $f(x+\mu(h-g_0))$.

In the oracle's calculation of $f(x+\mu(h-g_0))$, let $e_D(x,h)$ represent the error due to
computational precision and let $e_N(x,h)$ represent the departure from linearity, so that

$$c_r\circ c_f\circ c_p(c_d(x),h) = f(x+\mu(h-g_0)) + e_D(x,h) = f(x) + \nabla f(x)\cdot\mu(h-g_0) + e_N(x,h) + e_D(x,h).$$

('D' stands for 'discretisation' and 'N' for 'nonlinear'.) By (6), $x+\mu(h-g_0)\in D$. By
assumption, $|e_D(x,h)| \le \nu$, and by Lagrange's remainder for Taylor's series, we have
$|e_N(x,h)| \le M\mu^2(h-g_0)\cdot(h-g_0)/2$.

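We wish to bound the effects of the three sources of error separately; therefore it
will be convenient to write $|\psi\rangle$ as the sum $|\psi_L\rangle + |\psi_N\rangle + |\psi_D\rangle$, where

$$
\begin{aligned}
|\psi_L\rangle &= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} e^{2\pi i\lambda(f(x)+\mu\nabla f(x)\cdot(h-g_0))}|h\rangle,\\
|\psi_N\rangle &= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} \left(e^{2\pi i\lambda f(x+\mu(h-g_0))} - e^{2\pi i\lambda(f(x)+\mu\nabla f(x)\cdot(h-g_0))}\right)|h\rangle\\
&= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} \left(e^{2\pi i\lambda f(x+\mu(h-g_0))} - e^{2\pi i\lambda(f(x+\mu(h-g_0))-e_N(x,h))}\right)|h\rangle,\\
|\psi_D\rangle &= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} \left(e^{2\pi i\lambda\, c_r\circ c_f\circ c_p(c_d(x),h)} - e^{2\pi i\lambda f(x+\mu(h-g_0))}\right)|h\rangle\\
&= 2^{-pn/2}\sum_{h\in B_{\mathcal{G}}} \left(e^{2\pi i\lambda(f(x+\mu(h-g_0))+e_D(x,h))} - e^{2\pi i\lambda f(x+\mu(h-g_0))}\right)|h\rangle.
\end{aligned}
$$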



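Noting that, for any real $\alpha$ and $\beta$,

$$\left|e^{2\pi i\lambda(\alpha+\beta)} - e^{2\pi i\lambda\alpha}\right| = 2|\sin\pi\lambda\beta| \le 2\pi\lambda|\beta|,$$

we have

$$
\begin{aligned}
\||\psi_N\rangle\|_2 &\le 2^{-pn/2}\sqrt{\sum_{h\in B_{\mathcal{G}}}\left(\pi\lambda M\mu^2(h-g_0)\cdot(h-g_0)\right)^2}\\
&= \pi\lambda M\mu^2\sqrt{p\left(\frac{16^n}{80}-\frac{4^n}{24}+\frac{7}{240}\right)+p(p-1)\left(\frac{4^n-1}{12}\right)^2}\\
&\le 4^{n-1}p\pi\lambda M\mu^2/\sqrt{5}\\
&\le (1-\epsilon)/3
\end{aligned}
$$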
by (4), and

$$\||\psi_D\rangle\|_2 \le 2^{-pn/2}\sqrt{\sum_{h\in B_{\mathcal{G}}}(2\pi\lambda\nu)^2} = 2^{-pn/2}\times 2\pi\lambda\nu\sqrt{\sum_{h\in B_{\mathcal{G}}}1} = 2\pi\lambda\nu \le (1-\epsilon)/3$$

by (5).
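
The bound on $\||\psi_N\rangle\|_2$ rests on the mean of $((h-g_0)\cdot(h-g_0))^2$ over the grid being at most $16^{n-1}p^2/5$. The following sketch checks this exactly, in rational arithmetic, for a few small grids, and compares the brute-force mean with a closed form built from the second and fourth moments of a centred one-dimensional grid. The function names are illustrative assumptions, as is the reading of (4) with a factor $p$.

```python
from fractions import Fraction
from itertools import product


def mean_fourth_power(n: int, p: int) -> Fraction:
    """Exact mean of ((h - g0).(h - g0))**2 over the grid {0..2^n-1}^p."""
    size = 2 ** n
    g0 = Fraction(size - 1, 2)  # equals 2^(n-1) - 1/2
    total = sum(
        sum((h_m - g0) ** 2 for h_m in h) ** 2
        for h in product(range(size), repeat=p)
    )
    return total / size ** p


def closed_form(n: int, p: int) -> Fraction:
    """p*E[X^4] + p*(p-1)*E[X^2]^2 for X uniform on the centred 1-D grid."""
    second = Fraction(4 ** n - 1, 12)
    fourth = Fraction(16 ** n, 80) - Fraction(4 ** n, 24) + Fraction(7, 240)
    return p * fourth + p * (p - 1) * second ** 2


def bound(n: int, p: int) -> Fraction:
    """Square of 4^(n-1) * p / sqrt(5)."""
    return Fraction(16 ** (n - 1) * p * p, 5)


results = {
    (n, p): (mean_fourth_power(n, p), closed_form(n, p), bound(n, p))
    for n in (1, 2, 3)
    for p in (1, 2, 3)
}
```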

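Next we consider the error introduced by 'frequency leakage'. If the components of
$\nabla f(x)$ are integer multiples of $1/2^n\lambda\mu$, then $U_{QFT}|\psi_L\rangle$ is equal to a computational basis
state, identifying $\nabla f(x)$ exactly. In the general case, we obtain instead a superposition,
which strongly weights computational basis states representing gradients close to $\nabla f(x)$.
We have

$$
\begin{aligned}
U_{QFT}|\psi_L\rangle &= 2^{-pn}\sum_{g\in B_{\mathcal{G}}}\sum_{h\in B_{\mathcal{G}}} e^{2\pi i\left(g\cdot h/2^n+\lambda(f(x)+\mu\nabla f(x)\cdot(h-g_0))\right)}|g\rangle\\
&= e^{2\pi i\lambda(f(x)-\mu\nabla f(x)\cdot g_0)}\bigotimes_{m=1}^{p}|\phi_m\rangle,
\end{aligned}
$$

where

$$|\phi_m\rangle = 2^{-n}\sum_{g_m=0}^{2^n-1}\sum_{h_m=0}^{2^n-1} e^{2\pi i h_m\left(g_m/2^n+\lambda\mu\,\partial f(x)/\partial x_m\right)}|g_m\rangle.$$

The factors $|\phi_m\rangle$ are state vectors of unit magnitude, and note that

$$
\begin{aligned}
|\langle g_m|\phi_m\rangle| &\le 2^{-n}\left|\csc\left(\pi\left(\frac{g_m}{2^n}+\lambda\mu\frac{\partial f(x)}{\partial x_m}\right)\right)\right|\\
&= 2^{-n}\left|\csc\left(\pi\lambda\mu\left(c_{g,m}(g_m)-\frac{\partial f(x)}{\partial x_m}\right)\right)\right| \qquad (9)\\
&\le 2^{-n}|\csc(\pi\lambda\mu\delta)| \qquad (10)
\end{aligned}
$$

whenever $|c_{g,m}(g_m)-\partial f(x)/\partial x_m| > \delta$; we obtain (9) because $\lambda\mu c_{g,m}(g_m)+g_m/2^n$ is
always an integer, and $|\csc|$ is even and has period $\pi$; we obtain (10) due to the shape
of the cosecant function and because, by (7),

$$\pi\lambda\mu\left(c_{g,m}(g_m)-\partial f(x)/\partial x_m\right) \in [-\pi+\pi\lambda\mu\delta,\ \pi-\pi\lambda\mu\delta].$$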
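
The cosecant bound on the leakage amplitudes is easy to confirm numerically for a single coordinate. In the sketch below, `theta` stands for $\lambda\mu\,\partial f(x)/\partial x_m$; the function names and the values $n = 4$, $\theta = 0.3$ are illustrative assumptions.

```python
import cmath
import math


def leakage_amplitudes(n: int, theta: float):
    """|<g_m|phi_m>| for g_m = 0..2^n-1, with theta = lam*mu*df/dx_m."""
    size = 2 ** n
    return [
        abs(sum(cmath.exp(2j * math.pi * h * (g / size + theta))
                for h in range(size))) / size
        for g in range(size)
    ]


def csc_bound(n: int, theta: float, g: int) -> float:
    """Right-hand side 2^-n |csc(pi (g/2^n + theta))| of the first bound."""
    size = 2 ** n
    return abs(1 / math.sin(math.pi * (g / size + theta))) / size


n, theta = 4, 0.3
amplitudes = leakage_amplitudes(n, theta)
bounds = [csc_bound(n, theta, g) for g in range(2 ** n)]
peak = max(range(2 ** n), key=lambda g: amplitudes[g])
```

The peak sits at the grid index closest to $-2^n\theta$ modulo $2^n$, and the amplitudes fall off at least as fast as the cosecant envelope away from it.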

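Note that the projection $P$ can be written as $P_1\otimes\cdots\otimes P_p$, where

$$P_m = \sum_{h_m=0}^{2^n-1}\left\{|h_m\rangle\langle h_m| : \left|c_{g,m}(h_m)-\frac{\partial f(x)}{\partial x_m}\right| \le \delta\right\}.$$

Then

$$
\begin{aligned}
\|PU_{QFT}|\psi_L\rangle\|_2 &= \left|e^{2\pi i\lambda(f(x)-\mu\nabla f(x)\cdot g_0)}\right|\prod_{m=1}^{p}\|P_m|\phi_m\rangle\|_2\\
&= \prod_{m=1}^{p}\|P_m|\phi_m\rangle\|_2\\
&= \prod_{m=1}^{p}\sqrt{1-\sum_{h_m=0}^{2^n-1}\left\{|\langle h_m|\phi_m\rangle|^2 : \left|c_{g,m}(h_m)-\frac{\partial f(x)}{\partial x_m}\right| > \delta\right\}}\\
&\ge \prod_{m=1}^{p}\sqrt{1-2^n\left(2^{-n}|\csc(\pi\lambda\mu\delta)|\right)^2}\\
&= \left(1-2^{-n}\csc^2(\pi\lambda\mu\delta)\right)^{p/2}\\
&\ge (2+\epsilon)/3,
\end{aligned}
$$

by (8).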



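By the triangle inequality,

$$\|P|\chi\rangle\|_2 \ge \|PU_{QFT}|\psi_L\rangle\|_2 - \|PU_{QFT}|\psi_N\rangle\|_2 - \|PU_{QFT}|\psi_D\rangle\|_2.$$

Since $U_{QFT}$ is an isometry and $P$ is a projection and therefore a contraction,

$$\|P|\chi\rangle\|_2 \ge \frac{2+\epsilon}{3} - \frac{1-\epsilon}{3} - \frac{1-\epsilon}{3} = \epsilon.$$

□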



6 Some implementation and efficiency considerations 

Theorem 1 established that the algorithm $\mathcal{A}(n,\nu,\lambda,\mu;x)$ can perform at any required
level of precision, given suitable operating parameters. The algorithm consists of two
quantum Fourier transforms, a phase rotation operator, and the two operations $U_+$ and
$U_f$ together with their inverses.

Because we have restricted the sampling grid side-length to powers of two, the quantum
Fourier transform is easily computed. The standard quantum Fourier transform
in a $2^n$-dimensional state space uses just $n$ Hadamard gates and $n(n-1)/2$ controlled
phase rotation gates; see for details. The p-dimensional quantum Fourier transform 
$U_{QFT}$ required by the gradient estimation algorithm is simply the $p$th tensor power of
the standard quantum Fourier transform, meaning that it can be implemented by applying
the standard quantum Fourier transform, simultaneously but independently, to
the $p$ factors of $\mathcal{G}$.
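
The tensor-power structure can be verified directly on a small case: the matrix of the $p$-dimensional transform, with entries $2^{-pn/2}e^{2\pi i g\cdot h/2^n}$, coincides with the Kronecker product of $p$ copies of the one-dimensional transform. The sketch below checks this for $p = 2$, $n = 2$ and tallies the gate counts quoted above multiplied by $p$; the function names are illustrative assumptions, and any qubit-reordering gates are not counted.

```python
import cmath
import math


def qft_matrix(n: int):
    """Matrix of the standard transform on a 2^n-dimensional space."""
    size = 2 ** n
    return [
        [cmath.exp(2j * math.pi * g * h / size) / math.sqrt(size)
         for h in range(size)]
        for g in range(size)
    ]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def gate_counts(n: int, p: int):
    """Gate totals for p independent copies of the standard transform."""
    return {"hadamard": p * n, "controlled_phase": p * n * (n - 1) // 2}


n = 2
size = 2 ** n
one_dim = qft_matrix(n)
two_dim = kron(one_dim, one_dim)

# Largest deviation from the direct formula for the 2-dimensional transform.
max_error = max(
    abs(
        two_dim[g0 * size + g1][h0 * size + h1]
        - cmath.exp(2j * math.pi * (g0 * h0 + g1 * h1) / size) / size
    )
    for g0 in range(size)
    for g1 in range(size)
    for h0 in range(size)
    for h1 in range(size)
)
```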

The difficulty of implementing the phase rotation operator $U_R$ depends on the data
storage method used for function values, i.e., on $(\mathcal{R}, B_{\mathcal{R}}, c_r)$. Implementation is
straightforward if a binary fixed-point representation is used, that is, if $\mathcal{R}$ is the state space of
a system of say $N$ bits, and the bit sequence

$$(r_{N-1},\ldots,r_1,r_0)$$

represents the value $a_0 + a_1\sum_{k=0}^{N-1}2^k r_k$, for some real constants $a_0$ and $a_1$. In this case
we can simply pass each bit independently through a phase rotation gate, with matrix
representation

$$\begin{pmatrix} 1 & 0\\ 0 & e^{2\pi i\lambda a_1 2^k} \end{pmatrix};$$

these phase rotation gates are similar to, but simpler than, the controlled phase rotation
gates used in the quantum Fourier transform.
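
That the bitwise gates reproduce the required phase, up to the global factor $e^{2\pi i\lambda a_0}$, can be checked on every bit pattern of a short register. The sketch below does so for four bits; the function names and the values of $\lambda$, $a_0$ and $a_1$ are illustrative assumptions.

```python
import cmath
import math
from itertools import product


def bitwise_phase(bits, lam: float, a1: float) -> complex:
    """Phase picked up when bit k (bits[k] = r_k) passes through its gate."""
    phase = 1 + 0j
    for k, r_k in enumerate(bits):
        if r_k:  # the gate is diag(1, exp(2 pi i lam a1 2^k))
            phase *= cmath.exp(2j * math.pi * lam * a1 * 2 ** k)
    return phase


def direct_phase(bits, lam: float, a0: float, a1: float) -> complex:
    """exp(2 pi i lam v), where v is the value the bit sequence represents."""
    value = a0 + a1 * sum(2 ** k * r_k for k, r_k in enumerate(bits))
    return cmath.exp(2j * math.pi * lam * value)


lam, a0, a1 = 0.37, -1.0, 0.125
global_phase = cmath.exp(2j * math.pi * lam * a0)
max_error = max(
    abs(direct_phase(bits, lam, a0, a1) - global_phase * bitwise_phase(bits, lam, a1))
    for bits in product((0, 1), repeat=4)
)
```

The global factor is the same for every basis state and so has no observable effect.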

The operators $U_+$ and $U_f$ and their inverses simply perform machine arithmetic.
The operators $U_+$ and $U_+^{-1}$ each involve one multiplication and one addition per
domain dimension. The complexity of these operations in gate operations depends on the
precision required.

The computational complexity of the operator $U_f$ is entirely dependent on the given
function $f$. It is usual in complexity analyses of computations involving a black-box
function $f$ to assume that evaluations of $f$ will be the dominating cost, measuring
complexity by counting function evaluations. By that measure, the gradient estimation
algorithm scores very well, as it requires two oracle operations, $U_f$ and $U_f^{-1}$, the latter
having presumably the same complexity as the former. (In fact, recall that Section 3
assumed $B_{\mathcal{R}}$ to be a group; if this group is taken to be $\mathbb{Z}_2^N$, that is, if the computed
value is stored in $\mathcal{R}$ using the XOR operation, then $U_f^{-1}$ is just $U_f$.)
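
The remark about XOR storage can be illustrated on basis labels alone: writing the encoded function value into the range register by bitwise XOR gives a permutation of the basis states that is its own inverse. The sketch below uses a hypothetical 4-bit encoded function `toy_c_f`; both names are illustrative assumptions.

```python
def xor_oracle(c_f):
    """Action of U_f on basis labels (d, r, g) when '+' is bitwise XOR."""
    def u_f(state):
        d, r, g = state
        return (d, r ^ c_f(d), g)
    return u_f


def toy_c_f(d: int) -> int:
    # Hypothetical encoded function value, stored in 4 bits.
    return (d * d + 3) % 16


u_f = xor_oracle(toy_c_f)
states = [(d, r, 0) for d in range(8) for r in range(16)]
images = [u_f(s) for s in states]
round_trip = [u_f(s) for s in images]
```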

Of course, in order to perform at the required level of precision, we may require very 
great accuracy in the evaluations of /, and in the other computations. Note that this is 
a universal feature of machine computation. 

7 Conclusion 

Theorem 1 shows that the gradient of a real-valued multivariate function can be evaluated
to any required accuracy using just two function evaluations. As with any digital
computation, increased accuracy in the answer requires increased precision during the
computation. Thus the quantum complexity of the gradient estimation problem is constant
in dimension, which compares favourably with the classical complexity, which is
linear in dimension, for very high-dimensional functions.